Regularized (bridge) logistic regression for variable selection based on ROC criterion
Authors
Abstract
It is well known that bridge regression (with tuning parameter less than or equal to 1) gives asymptotically unbiased estimates of the nonzero regression parameters while shrinking the smaller regression parameters to zero, thereby achieving variable selection. Despite advances over the last several decades in developing such regularized regression models, questions about the choice of penalty parameter and about computational methods for fitting models with parameter constraints remain unresolved, even for bridge linear regression. In this article, we first propose a new criterion based on the area under the receiver operating characteristic (ROC) curve (AUC) for choosing the penalty parameter, as an alternative to the conventional generalized cross-validation criterion. The model selected by the AUC criterion is shown to have better predictive accuracy while simultaneously achieving sparsity. We then approach the problem as a constrained-parameter model and develop a fast minorization-maximization (MM) algorithm for nonlinear optimization under positivity constraints for model fitting. This algorithm is further applied to bridge regression for binary responses, where the regression coefficients are constrained by a p-norm with the value of p selected from the data. Examples of prognostic factor and gene selection are presented to illustrate the proposed method.
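As an illustration of the AUC-based choice of the penalty parameter, the minimal sketch below uses scikit-learn's L1-penalized logistic regression (the p = 1 boundary case of the bridge penalty); the synthetic data and the variable names are stand-ins, and the paper's MM algorithm for general p ≤ 1 is not reproduced. The candidate penalty strength with the highest cross-validated AUC is retained, and the nonzero coefficients give the selected variables.

```python
# Minimal sketch (not the paper's algorithm): bridge regression with p = 1
# (the LASSO boundary case) fitted with scikit-learn, selecting the penalty
# strength by cross-validated AUC instead of generalized cross-validation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary-response data standing in for a gene-expression matrix.
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           random_state=0)

# Candidate penalty parameters (C is the inverse of the regularization strength).
candidate_C = np.logspace(-2, 2, 20)
cv_auc = []
for C in candidate_C:
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    # Mean 5-fold cross-validated AUC plays the role of the AUC criterion.
    cv_auc.append(cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean())

best_C = candidate_C[int(np.argmax(cv_auc))]
final = LogisticRegression(penalty="l1", solver="liblinear", C=best_C).fit(X, y)
selected = np.flatnonzero(final.coef_.ravel())  # indices of retained variables
print(f"best C = {best_C:.3g}, selected {selected.size} variables: {selected}")
```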
Similar Resources
Simulation-based Regularized Logistic Regression
In this paper, we develop a simulation-based framework for regularized logistic regression, exploiting two novel results for scale mixtures of normals. By carefully choosing a hierarchical model for the likelihood by one type of mixture, and implementing regularization with another, we obtain new MCMC schemes with varying efficiency depending on the data type (binary v. binomial, say) and the d...
C# .NET Algorithm for Variable Selection Based on the Mallow's Cp Criterion
Variable selection techniques are important in statistical modeling because they seek to simultaneously reduce the chances of data overfitting and to minimize the effects of omission bias. The linear or ordinary least squares regression model is particularly useful in variable selection because of its association with certain optimality criteria. One of these is the Mallow's Cp Criterion whic...
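For concreteness, the Mallows Cp statistic for a candidate subset with k predictors can be written as Cp = SSE_k / σ̂² − n + 2(k + 1), where σ̂² is the full-model mean squared error. The short Python sketch below is only an illustration of that criterion (not the C#/.NET implementation the abstract describes; the data and names are synthetic): it scores every subset and reports the one minimizing Cp.

```python
# Illustrative sketch of the Mallows Cp criterion:
#   Cp = SSE_subset / MSE_full - n + 2 * (k + 1), with k predictors in the subset.
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 6
X = rng.normal(size=(n, d))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=n)   # only two real signals

def sse(cols):
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    Z = np.column_stack([np.ones(n), X[:, list(cols)]])
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return float(resid @ resid)

mse_full = sse(tuple(range(d))) / (n - d - 1)             # full-model error variance
scores = {cols: sse(cols) / mse_full - n + 2 * (len(cols) + 1)
          for k in range(1, d + 1) for cols in combinations(range(d), k)}
best = min(scores, key=scores.get)
print("subset with smallest Cp:", best, "Cp =", round(scores[best], 2))
```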
Regularized ROC method for disease classification and biomarker selection with microarray data
MOTIVATION An important application of microarrays is to discover genomic biomarkers, among tens of thousands of genes assayed, for disease classification. Thus there is a need for developing statistical methods that can efficiently use such high-throughput genomic data, select biomarkers with discriminant power and construct classification rules. The ROC (receiver operator characteristic) tech...
Variable Selection in ROC Regression
Regression models are introduced into the receiver operating characteristic (ROC) analysis to accommodate effects of covariates, such as genes. If many covariates are available, the variable selection issue arises. The traditional induced methodology separately models outcomes of diseased and nondiseased groups; thus, separate application of variable selections to two models will bring barriers...
Bayesian sample size estimation for logistic regression
The paper is devoted to logistic regression analysis [1], applied to classification problems in biomedicine. A group of patients is investigated as a sample set; each patient is described by a set of features, named biomarkers, and is classified into two classes. Since patient measurement is expensive, the problem is to reduce the number of measured features in order to increase sample ...
Journal title:
Volume, Issue:
Pages: -
Publication date: 2009